{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "### License\n",
    "Copyright 2021 Google LLC\n",
    "\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
    "\n",
    "  http://www.apache.org/licenses/LICENSE-2.0\n",
    "\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 1. Installing Dependencies\n",
    "\n",
    "Run this file to install all dependencies needed by the Notebook. You can skip running this cell if you have runned this cell in the current environment previously."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install -r requirements.txt\n",
    "!jupyter nbextension enable --py widgetsnbextension\n",
    "!jupyter serverextension enable voila --sys-prefix"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 2. Setting Environment Variables\n",
    "<ul>\n",
    "    <li> Make sure you have set your environment vairables in `var.env` file.\n",
    "    <li> Pick the date range for your analysis\n",
    "    <li> After resetting any environment variables, you need to restart the kernel because otherwise it will not be loaded by Jupyter. To restart, go to the menu 'Kernel' and choose 'Restart'\n",
    "    <li> Run all the cells in this section\n",
    "    <li> Make sure the environment variables are set correctly\n",
    "<ul>"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv('var.env')\n",
    "\n",
    "import os\n",
    "from itables import init_notebook_mode\n",
    "if os.getenv('IS_INTERACTIVE_TABLES_MODE') == 'TRUE':\n",
    "    init_notebook_mode(all_interactive=True)\n",
    "\n",
    "from IPython.display import display\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import HBox\n",
    "start_date_picker = widgets.DatePicker(description='Start Date')\n",
    "end_date_picker = widgets.DatePicker(description='End Date')\n",
    "date_pickers = HBox(children=[start_date_picker, end_date_picker])\n",
    "display(date_pickers)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "os.environ['START_DATE'] = str(start_date_picker.value)\n",
    "os.environ['END_DATE'] = str(end_date_picker.value)\n",
    "print('--- LOADED ENVIRONMENT VARIABLES ---')\n",
    "print(f\"INPUT_PROJECT_ID: {os.getenv('INPUT_PROJECT_ID')}\")\n",
    "print(f\"INPUT_DATASET_ID: {os.getenv('INPUT_DATASET_ID')}\")\n",
    "print(f\"INPUT_AUDIT_LOGS_TABLE_ID: {os.getenv('INPUT_AUDIT_LOGS_TABLE_ID')}\")\n",
    "print(f\"IS_AUDIT_LOGS_TABLE_PARTITIONED: {os.getenv('IS_AUDIT_LOGS_INPUT_TABLE_PARTITIONED')}\")\n",
    "print(f\"OUTPUT_PROJECT_ID: {os.getenv('OUTPUT_PROJECT_ID')}\")\n",
    "print(f\"OUTPUT_DATASET_ID: {os.getenv('OUTPUT_DATASET_ID')}\")\n",
    "print(f\"OUTPUT_TABLE_SUFFIX: {os.getenv('OUTPUT_TABLE_SUFFIX')}\")\n",
    "print(f\"LOCATION: {os.getenv('LOCATION')}\")\n",
    "print(f\"START_DATE: {os.getenv('START_DATE')}\")\n",
    "print(f\"END_DATE: {os.getenv('END_DATE')}\")\n",
    "print(f\"IS_INTERACTIVE_TABLES_MODE: {os.getenv('IS_INTERACTIVE_TABLES_MODE')}\")"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 3. Creating Tables for Current Analysis Environment\n",
    "\n",
    "Run the cell below to create tables that is necessary for the analysis"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "from src.bq_query import BQQuery\n",
    "try:\n",
    "    BQQuery.create_functions_for_pipeline_analysis()\n",
    "    BQQuery.create_tables_for_pipeline_analysis()\n",
    "except Exception as e:\n",
    "    print('Unable to create tables, do not continue with the analysis')\n",
    "    print(e)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 4. Getting Analysis Result\n",
    "\n",
    "### Get the tables with highest discrepancy on write vs read frequency throughout the data warehouse\n",
    "This will list down tables with the highest discrepancy on write vs read frequency.\n",
    "\n",
    "1. Run the cell\n",
    "2. Set the limit on how many tables you want to be displayed using the text box, please insert positive values only. \n",
    "3. Click 'Run' and wait until the result is retrieved."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import src.pipeline_analysis as pipeline_analysis\n",
    "import ipywidgets as widgets\n",
    "from IPython.display import display\n",
    "import pandas as pd\n",
    "\n",
    "limited_imbalance_tables = []\n",
    "limited_imbalance_tables_df = pd.DataFrame()\n",
    "def get_limited_imbalance_tables_df(limit):\n",
    "    global limited_imbalance_tables, limited_imbalance_tables_df\n",
    "    limited_imbalance_tables_df = pipeline_analysis.get_tables_read_write_frequency_df(limit)\n",
    "    limited_imbalance_tables = limited_imbalance_tables_df['Table'].tolist()\n",
    "    return limited_imbalance_tables_df\n",
    "\n",
    "widgets.interact_manual.opts['manual_name'] = 'Run'\n",
    "widgets.interact_manual(get_limited_imbalance_tables_df, limit= widgets.IntText(value=3))\n",
    ";"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Get the pipeline graph data of the table\n",
    "This will generate a pipeline graph file, in HTML format, under `pipeline_graph` directory. It may take sometime for this to run and generate.\n",
    "\n",
    "1. Choose the table of interest, the table that you are interested to explore further by displaying its pipeline graph.\n",
    "2. Click 'Run' and wait until the run is finished (indicated by non grayed-out box).\n",
    "3. Run the next cell to display the graph"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "def visualise_table_pipelines(table):\n",
    "    pipeline_analysis.display_pipelines_of_table(table)\n",
    "\n",
    "widgets.interact_manual(visualise_table_pipelines, table = widgets.Dropdown(options=limited_imbalance_tables+ [''], value='', description='Table:'))\n",
    ";"
   ],
   "outputs": [],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Display the pipeline graph of the table\n",
    "Display the pipeline graph of the table. The thickness of the edges indicates the frequency compared to the rest of the edges in the current graph.\n",
    "\n",
    "1. Run the cell to display the pipeline graph of the table in the iFrame below\n",
    "2. You can click on the different nodes of the graph, each representing different tbales that are part of the pipeline of this table of interest. When you click on a node, it will display more information for this table."
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "from IPython.display import IFrame,HTML, display\n",
    "display(IFrame('./pipeline_graph/index.html', width=\"1000\", height=\"800\"))"
   ],
   "outputs": [],
   "metadata": {}
  }
 ],
 "metadata": {
  "environment": {
   "name": "common-cpu.m76",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/base-cpu:m76"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}